11 research outputs found

    Fake Run-Time Selection of Template Arguments in C++

    C++ does not support run-time resolution of template type arguments. To circumvent this restriction, we can instantiate a template for all possible combinations of type arguments at compile time and then select the proper instance at run time by evaluation of some provided conditions. However, for templates with multiple type parameters such a solution may easily result in a branching code bloat. We present a template metaprogramming algorithm called for_id that allows the user to select the proper template instance at run time with theoretical minimum sustained complexity of the branching code. Comment: Objects, Models, Components, Patterns (50th International Conference, TOOLS 2012)
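
    The hand-written baseline that this abstract describes can be sketched as follows. This is not the for_id algorithm, only an illustration of the branching it aims to reduce. The names TypeId, combined_size, select_second and select_instance are hypothetical and chosen for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>

// Hypothetical workload: returns the combined size of its two type arguments.
template <typename A, typename B>
std::size_t combined_size() {
    return sizeof(A) + sizeof(B);
}

// Hypothetical run-time identifiers for the supported types.
enum class TypeId { Int8, Int32, Double };

// Second level of branching: A is already fixed, B is chosen at run time.
template <typename A>
std::size_t select_second(TypeId b) {
    switch (b) {
        case TypeId::Int8:   return combined_size<A, std::int8_t>();
        case TypeId::Int32:  return combined_size<A, std::int32_t>();
        case TypeId::Double: return combined_size<A, double>();
    }
    throw std::invalid_argument("unknown type id");
}

// First level of branching: A is chosen at run time, then B is delegated.
// All 3 x 3 = 9 instances of combined_size are compiled into the program.
inline std::size_t select_instance(TypeId a, TypeId b) {
    switch (a) {
        case TypeId::Int8:   return select_second<std::int8_t>(b);
        case TypeId::Int32:  return select_second<std::int32_t>(b);
        case TypeId::Double: return select_second<double>(b);
    }
    throw std::invalid_argument("unknown type id");
}
```

    A call such as select_instance(TypeId::Int32, TypeId::Double) picks the instance combined_size<std::int32_t, double> at run time. Every further template parameter adds another level of switch statements, which is the growth in branching code that the abstract refers to.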

    Parallel Solver of Large Systems of Linear Inequalities Using Fourier-Motzkin Elimination

    Fourier-Motzkin elimination is a computationally expensive but powerful method to solve a system of linear inequalities. These systems arise, e.g., in execution order analysis for loop nests or in integer linear programming. This paper focuses on the analysis, design and implementation of a parallel solver for distributed memory for large systems of linear inequalities using the Fourier-Motzkin elimination algorithm. We also measure the speedup of the parallel solver and prove that this implementation results in good scalability.
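
    A single elimination step of the method can be sketched as below. This is a minimal sequential illustration with exact integer coefficients, not the distributed-memory solver from the paper. The Inequality struct and the eliminate function are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One inequality: coeffs[0]*x0 + ... + coeffs[n-1]*x(n-1) <= rhs.
struct Inequality {
    std::vector<long long> coeffs;
    long long rhs;
};

// Eliminates variable k. Inequalities with a zero coefficient at k are kept
// unchanged; every (positive, negative) pair is combined with positive
// multipliers so that the coefficient of x_k cancels to zero.
inline std::vector<Inequality> eliminate(const std::vector<Inequality>& system,
                                         std::size_t k) {
    std::vector<Inequality> pos, neg, result;
    for (const Inequality& ineq : system) {
        assert(k < ineq.coeffs.size());
        if (ineq.coeffs[k] > 0) {
            pos.push_back(ineq);
        } else if (ineq.coeffs[k] < 0) {
            neg.push_back(ineq);
        } else {
            result.push_back(ineq);
        }
    }
    for (const Inequality& p : pos) {
        for (const Inequality& n : neg) {
            assert(p.coeffs.size() == n.coeffs.size());
            const long long mp = -n.coeffs[k];  // multiplier for p, positive
            const long long mn = p.coeffs[k];   // multiplier for n, positive
            Inequality combined{std::vector<long long>(p.coeffs.size()), 0};
            for (std::size_t i = 0; i < p.coeffs.size(); ++i) {
                combined.coeffs[i] = mp * p.coeffs[i] + mn * n.coeffs[i];
            }
            combined.rhs = mp * p.rhs + mn * n.rhs;
            result.push_back(combined);
        }
    }
    return result;
}
```

    Eliminating x from the system x + y <= 4, -x + y <= 2, y <= 5 keeps y <= 5 and adds 2y <= 6. Each step can produce as many new inequalities as there are (positive, negative) pairs, which is why the method becomes expensive for large systems.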

    Algorithm 947: Paraperm - parallel generation of random permutations with MPI

    An algorithm for parallel generation of a random permutation of a large set of distinct integers is presented. This algorithm is designed for massively parallel systems with distributed memory architectures and MPI-based runtime environments. Scalability of the algorithm is analyzed according to the memory and communication requirements. An implementation of the algorithm in the form of a software library based on the C++ programming language and the MPI application programming interface is further provided. Finally, the performed experiments are described and their results discussed. The largest of these experiments generated a random permutation of 2^41 integers in slightly more than four minutes using 131072 CPU cores.

    Accelerating many-nucleon basis generation for high performance computing enabled ab initio nuclear structure studies

    We present the problem of generating a many-nucleon basis in the SU(3)-scheme for ab initio nuclear structure calculations in a symmetry-adapted no-core shell model framework. We first discuss and analyze the basis construction algorithm, whose baseline implementation quickly becomes a significant bottleneck for large model spaces and heavier nuclei. The outcomes of this analysis are used to propose a new scalable version of the algorithm. Its performance is then studied empirically using the Blue Waters supercomputer. The measurements show significant acceleration, with speedups of over two orders of magnitude for larger model spaces.

    Block Iterators for Sparse Matrices


    Transformation of a nucleon-nucleon potential operator into its SU(3) tensor form using GPUs

    Starting from the matrix elements of a nucleon-nucleon potential operator provided in a basis of spherical harmonic oscillator functions, we present an algorithm for expressing a given potential operator in terms of irreducible tensors of the SU(3) and SU(2) groups. Further, we introduce a GPU-based implementation of this algorithm and compare its performance with that of a CPU-based version. We find that the CUDA implementation delivers speedups of 2.27x - 5.93x.

    Efficient parallel evaluation of block properties of sparse matrices


    SU3lib: A C++ library for accurate computation of Wigner and Racah coefficients of SU(3)

    We present the C++ library SU3lib for accurate computation of SU(3) Wigner coupling and Racah recoupling coefficients. It is built on the efficient mathematical algorithm originally proposed by Draayer and Akiyama [1]. The presented library extends the reach of this algorithm towards large SU(3) irreducible representations and outer multiplicities that were heretofore inaccessible due to floating-point precision errors. As large irreducible representations of SU(3) play an important role in medium- and heavy-mass atomic nuclei, SU3lib expands the scope of approaches to nuclear structure and reactions that rely on available SU(3) coupling-recoupling coefficients.
    Program summary:
    Program Title: SU3lib
    CPC Library link to program files: https://doi.org/10.17632/j977v8v5fp.1
    Developer's repository link: https://gitlab.com/tdytrych/SU3lib
    Licensing provisions: BSD 2-clause
    Programming language: C++
    External libraries: WIGXJPF [3], Boost
    Nature of problem: Accurate calculation of SU(3)⊃SO(3) and SU(3)⊃SU(2)×U(1) Wigner coupling and Racah recoupling coefficients for arbitrary couplings and multiplicity.
    Solution method: We adopt the mathematical procedure proposed by Draayer and Akiyama [1], who also provided its implementation as a FORTRAN library [2]. The challenge is to avoid the loss of precision due to cancellation in sums of large alternating terms in transformation between SU(3)⊃SO(3) and SU(3)⊃SU(2)×U(1) schemes, and to compute SU(3)⊃SU(2)×U(1) Wigner coefficients accurately for large outer multiplicities. The present library tackles these challenges by implementing key formulas and data structures as C++ templates and utilizing floating-point data types with extended precision provided by the Boost.Multiprecision library as template arguments. This permits an efficient and accurate computation of SU(3) coefficients even for large SU(3) irreps and outer multiplicities that were heretofore inaccessible.
    References:
    [1] J. P. Draayer and Y. Akiyama, J. Math. Phys. 14, 1904 (1973).
    [2] Y. Akiyama and J. P. Draayer, Comp. Phys. Comm. 5, 405 (1973).
    [3] H. T. Johansson and C. Forssén, SIAM J. Sci. Comput. 38(1), A376 (2016).

    Efficient algorithm for representations of U(3) in U(N)

    An efficient algorithm for enumerating representations of U(3) that occur in a representation of the unitary group U(N) is introduced. The algorithm is applicable to U(N) representations associated with a system of identical fermions (protons, neutrons, electrons, etc.) distributed among the N=(η+1)(η+2)∕2 degenerate eigenstates of the ηth level of the three-dimensional harmonic oscillator. A C++ implementation of the algorithm is provided and its performance is evaluated. The implementation can employ OpenMP threading for use in parallel applications.
    Program summary:
    Program Title: UNtoU3.h
    Program files doi: http://dx.doi.org/10.17632/3g4w8f9vdk.1
    Licensing provisions: MIT
    Programming language: C++
    Nature of problem: The determination of the complete set of U(3) irreducible representations (irreps) that occurs in a representation of U(N), where N=(η+1)(η+2)∕2 is the degeneracy of the ηth harmonic oscillator shell.
    Solution method: The resulting set of U(3) irreps is determined by applying a simple difference relation to the U(3) weight distribution of the Gelfand basis states spanning a given U(N) irrep.
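
    The shell degeneracy N=(η+1)(η+2)∕2 quoted in this abstract can be checked with a short sketch. It compares the closed form with a direct count of the oscillator quanta triples (nx, ny, nz) that sum to η. The function names are hypothetical and are not part of UNtoU3.h.

```cpp
#include <cassert>

// Closed form for the degeneracy of the eta-th shell of the
// three-dimensional harmonic oscillator.
constexpr unsigned shell_degeneracy(unsigned eta) {
    return (eta + 1) * (eta + 2) / 2;
}

// Direct count of triples (nx, ny, nz) of non-negative integers
// with nx + ny + nz == eta.
inline unsigned count_oscillator_states(unsigned eta) {
    unsigned count = 0;
    for (unsigned nx = 0; nx <= eta; ++nx) {
        for (unsigned ny = 0; nx + ny <= eta; ++ny) {
            ++count;  // nz = eta - nx - ny is fixed by the choice of nx and ny
        }
    }
    return count;
}
```

    For η = 0, 1, 2, 3 both functions give N = 1, 3, 6, 10, so the relevant unitary groups are U(1), U(3), U(6) and U(10).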